
Parallel SageAttention Inference #50

Merged — 13 commits merged into thu-ml:main on Dec 14, 2024

Conversation

@DefTruth (Contributor) commented Nov 26, 2024

This PR adds a small workaround that makes SageAttention compatible with distributed environments, for example xDiT, which is launched via torchrun. Without this workaround, SageAttention runs into an illegal memory access error after the first inference step during multi-GPU distributed inference. The workaround also makes SageAttention compatible with torch.compile in non-fullgraph compile mode.
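Roughly, the usage pattern this targets looks like the sketch below. It is illustrative only, not the code in this PR: the toy `SageSelfAttention` module and the tensor shapes are made up, and the `sageattn` import follows the package's public API.

```python
# Minimal sketch of the target setup, assuming the public `sageattn` kernel:
# each torchrun rank pins its own CUDA device before the first SageAttention
# call, and torch.compile runs in non-fullgraph mode so graph breaks around
# the custom kernel are allowed.
import os

import torch
import torch.distributed as dist
from sageattention import sageattn


class SageSelfAttention(torch.nn.Module):
    """Toy module whose core op is sageattn instead of SDPA (illustrative only)."""

    def forward(self, q, k, v):
        # q, k, v: (batch, heads, seq_len, head_dim), fp16/bf16, on the current device.
        return sageattn(q, k, v, is_causal=False)


if __name__ == "__main__":
    # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)

    attn = SageSelfAttention().cuda()
    attn = torch.compile(attn, fullgraph=False)  # non-fullgraph compile mode

    q = k = v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
    out = attn(q, k, v)
    print(f"rank {dist.get_rank()}: output shape {tuple(out.shape)}")

    dist.destroy_process_group()
```

A script like this would be launched with something along the lines of `torchrun --nproc_per_node=2 script.py`, so every rank runs the same code on its own GPU.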

@jason-huang03

@DefTruth changed the title from "Parallel SageAttention Inference Support" to "Parallel SageAttention Inference" on Nov 26, 2024
@jason-huang03 (Member) commented:

Thanks a lot! We will check the implementation and merge the PR.

@jason-huang03 self-assigned this on Nov 26, 2024
@DefTruth (Contributor, Author) commented Nov 26, 2024

You may need to install the latest xDiT from source if your environment already has FlashAttention >= 2.7.0; I just made a hotfix to keep the ring flash attention forward pass compatible with the latest FlashAttention, so it will not run into a function launch error.

Also, the plug-and-play SageAttention currently only works with CFG parallelism on 2 GPUs.
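For reference, CFG parallelism over 2 GPUs means each rank evaluates one classifier-free-guidance branch, and the guided prediction is combined afterwards. The sketch below is an assumption-laden illustration of that idea, not xDiT's actual implementation; `cfg_parallel_step`, `model`, `cond_emb`, and `uncond_emb` are hypothetical names.

```python
import torch
import torch.distributed as dist


def cfg_parallel_step(model, latents, cond_emb, uncond_emb, guidance_scale=7.5):
    # Rank 0 evaluates the conditional branch, rank 1 the unconditional branch;
    # the model internally uses SageAttention for its attention layers.
    rank = dist.get_rank()
    emb = cond_emb if rank == 0 else uncond_emb
    local_pred = model(latents, emb)

    # Exchange the two branches so both ranks can form the guided prediction.
    preds = [torch.empty_like(local_pred) for _ in range(2)]
    dist.all_gather(preds, local_pred)
    cond_pred, uncond_pred = preds

    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)
```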

@jason-huang03 (Member) commented:

I am busy these days and I will dive into it as soon as I can.

@jason-huang03 merged commit 552e23b into thu-ml:main on Dec 14, 2024
jt-zhang pushed a commit that referenced this pull request Dec 25, 2024
* relax nvcc version check for sm_89

* Update setup.py

* workaround for distributed inference

* Create parallel_sageattn_cogvideo.py

* Create run_parallel.sh

* Update README.md

* Update core.py

* Update run_parallel.sh

* Update run_parallel.sh

* Update core.py

* Update README.md

* Update core.py

* Update core.py